Maracaibo
A Psychology-based Unified Dynamic Framework for Curriculum Learning
Meng, Guangyu, Zeng, Qingkai, Lalor, John P., Yu, Hong
Directly learning from examples of random difficulty levels is often challenging for both humans and machine learning models. A more effective strategy involves exposing learners to examples in a progressive order, from easy to difficult. Curriculum Learning (CL) has been proposed to implement this strategy in machine learning model training. However, two key challenges persist in CL framework design: defining the difficulty of training data and determining the appropriate amount of data to input at each training step. This paper presents a Psychology-based Unified Dynamic Framework for Curriculum Learning (PUDF), drawing inspiration from psychometrics. We quantify the difficulty of training data by applying Item Response Theory (IRT) to responses from Artificial Crowds (AC). This theory-driven IRT-AC approach yields global (i.e., model-independent) and interpretable difficulty values. Leveraging IRT, we propose a Dynamic Data Selection via Model Ability Estimation (DDS-MAE) strategy to schedule the appropriate amount of data during model training. Because our difficulty labeling and model ability estimation are both based on the same theory, namely IRT, their values are comparable on the same scale, potentially leading to faster convergence than other CL methods. Experimental results demonstrate that fine-tuning pre-trained language models with PUDF enhances their performance on the GLUE benchmark. Moreover, PUDF surpasses other state-of-the-art (SOTA) CL methods on GLUE. We further explore the components of PUDF, namely the difficulty measurer (IRT-AC) and the training scheduler (DDS-MAE), both qualitatively and quantitatively. Lastly, we conduct an ablation study to clarify which components of PUDF contribute to faster convergence and higher accuracy.
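The abstract's key idea is that data difficulty and model ability live on one IRT scale, so a scheduler can compare them directly. The sketch below illustrates that idea under loudly stated assumptions: a one-parameter logistic (Rasch) IRT model, item difficulties that are already estimated (in PUDF they come from artificial-crowd responses), a maximum-likelihood ability estimate, and a selection rule that keeps examples whose difficulty does not exceed the current ability. All function names are hypothetical; this is not the authors' implementation of IRT-AC or DDS-MAE.

```python
import math
from typing import List, Sequence


def p_correct(theta: float, b: float) -> float:
    """Assumed 1PL (Rasch) probability that ability theta answers difficulty b correctly."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))


def estimate_ability(difficulties: Sequence[float], responses: Sequence[int],
                     lo: float = -6.0, hi: float = 6.0, iters: int = 60) -> float:
    """Maximum-likelihood ability estimate under the assumed 1PL model.

    The score function sum(y_i - p_i(theta)) is strictly decreasing in theta,
    so its root is found by bisection. The result is clamped to [lo, hi],
    which also covers the all-correct / all-wrong cases where the MLE is infinite.
    """
    def score(theta: float) -> float:
        return sum(y - p_correct(theta, b) for b, y in zip(difficulties, responses))

    if score(lo) <= 0:
        return lo
    if score(hi) >= 0:
        return hi
    for _ in range(iters):
        mid = (lo + hi) / 2
        if score(mid) > 0:
            lo = mid  # root lies to the right of mid
        else:
            hi = mid  # root lies at or to the left of mid
    return (lo + hi) / 2


def select_training_data(difficulties: Sequence[float], theta: float) -> List[int]:
    """Hypothetical DDS-MAE-style rule: indices of examples no harder than the ability."""
    return [i for i, b in enumerate(difficulties) if b <= theta]


if __name__ == "__main__":
    # Toy example: five items on the difficulty scale and the model's
    # correct (1) / incorrect (0) responses to them.
    difficulties = [-2.0, -1.0, 0.0, 1.0, 2.0]
    responses = [1, 1, 1, 0, 0]
    theta = estimate_ability(difficulties, responses)
    print(f"estimated ability: {theta:.2f}")
    print(select_training_data(difficulties, theta))  # [0, 1, 2]
```

Clamping the estimate to a bounded interval is a choice made for this sketch: with all-correct or all-wrong responses the 1PL maximum-likelihood estimate diverges, so some bound or prior is needed in practice. As the estimated ability grows during training, the selected subset grows with it.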
- North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
- Asia > Middle East > Jordan (0.04)
- Europe > Holy See > Vatican City (0.04)
- Workflow (1.00)
- Research Report > New Finding (1.00)
- Education (1.00)
- Energy (0.93)
- Media > Music (0.67)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.92)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.68)
- Information Technology > Artificial Intelligence > Machine Learning > Inductive Learning (0.66)
VERISCORE: Evaluating the factuality of verifiable claims in long-form text generation
Song, Yixiao, Kim, Yekyung, Iyyer, Mohit
Existing metrics for evaluating the factuality of long-form text, such as FACTSCORE (Min et al., 2023) and SAFE (Wei et al., 2024), decompose an input text into "atomic claims" and verify each against a knowledge base like Wikipedia. These metrics are not suitable for most generation tasks because they assume that every claim is verifiable (i.e., can plausibly be proven true or false). We address this issue with VERISCORE, a metric for diverse long-form generation tasks that contain both verifiable and unverifiable content. VERISCORE can be effectively implemented with either closed or fine-tuned open-weight language models, and human evaluation confirms that VERISCORE's extracted claims are more sensible than those from competing methods across eight different long-form tasks. We use VERISCORE to evaluate generations from 16 different models across multiple long-form tasks and find that while GPT-4o is the best-performing model overall, open-weight models such as Mixtral-8x22 are closing the gap. We show that an LM's VERISCORE on one task (e.g., biography generation) does not necessarily correlate with its VERISCORE on a different task (e.g., long-form QA), highlighting the need for expanding factuality evaluation across tasks with varying fact density.
- North America > United States > California > Santa Clara County > Palo Alto (0.04)
- North America > Mexico > Mexico City > Mexico City (0.04)
- Europe > Russia (0.04)
- Health & Medicine (0.93)
- Energy (0.67)
RobuT: A Systematic Study of Table QA Robustness Against Human-Annotated Adversarial Perturbations
Zhao, Yilun, Zhao, Chen, Nan, Linyong, Qi, Zhenting, Zhang, Wenlin, Tang, Xiangru, Mi, Boyu, Radev, Dragomir
Despite significant progress in question answering on tabular data (Table QA), it is unclear whether, and to what extent, existing Table QA models are robust to task-specific perturbations, e.g., replacing key question entities or shuffling table columns. To systematically study the robustness of Table QA models, we propose a benchmark called RobuT, which builds upon existing Table QA datasets (WTQ, WikiSQL-Weak, and SQA) and includes human-annotated adversarial perturbations to the table header, table content, and question. Our results indicate that both state-of-the-art Table QA models and large language models (e.g., GPT-3) with few-shot learning falter on these adversarial sets. We propose to address this problem by using large language models to generate adversarial examples to enhance training, which significantly improves the robustness of Table QA models. Our data and code are publicly available at https://github.com/yilunzhao/RobuT.
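One perturbation type named in the abstract is shuffling table columns: the table's meaning is unchanged, so a robust Table QA model should return the same answer. The sketch below applies such a shuffle automatically and checks answer invariance. Note that RobuT's perturbations are human-annotated, so this only illustrates the perturbation type, not the benchmark's construction procedure. The `(header, rows)` table representation, the function names, and the toy `lookup` "model" (which reads columns by name and is therefore order-invariant by design) are all hypothetical.

```python
import random
from typing import List, Tuple

# Hypothetical table representation: (header, rows), each row aligned with the header.
Table = Tuple[List[str], List[List[str]]]


def shuffle_columns(table: Table, seed: int = 0) -> Table:
    """Return a new table whose columns are permuted; cell-header alignment is kept."""
    header, rows = table
    order = list(range(len(header)))
    random.Random(seed).shuffle(order)  # seeded so the perturbation is reproducible
    new_header = [header[j] for j in order]
    new_rows = [[row[j] for j in order] for row in rows]
    return new_header, new_rows


def lookup(table: Table, key_col: str, key: str, target_col: str) -> str:
    """Toy QA 'model': find the row where key_col == key and return target_col."""
    header, rows = table
    k, t = header.index(key_col), header.index(target_col)
    for row in rows:
        if row[k] == key:
            return row[t]
    raise KeyError(key)


if __name__ == "__main__":
    table: Table = (
        ["city", "country", "continent"],
        [["Maracaibo", "Venezuela", "South America"],
         ["Tokyo", "Japan", "Asia"],
         ["Dublin", "Ireland", "Europe"]],
    )
    perturbed = shuffle_columns(table, seed=3)
    print(perturbed[0])
    # A robust model answers identically on the original and perturbed tables.
    print(lookup(perturbed, "city", "Tokyo", "country"))  # Japan
    assert lookup(table, "city", "Tokyo", "country") == lookup(perturbed, "city", "Tokyo", "country")
```

A model that instead relied on column positions (for example, "the second column is the country") would be exposed by this kind of perturbation, which is the failure mode the benchmark is designed to surface.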
- Asia > Japan > Honshū > Kantō > Tokyo Metropolis Prefecture > Tokyo (0.14)
- North America > United States > California > Los Angeles County > Los Angeles (0.14)
- Europe > Ireland > Leinster > County Dublin > Dublin (0.04)
Regionalized models for Spanish language variations based on Twitter
Tellez, Eric S., Moctezuma, Daniela, Miranda, Sabino, Graff, Mario, Ruiz, Guillermo
Spanish is one of the most widely spoken languages in the world, but it is not necessarily written and spoken the same way in different countries. Understanding local language variations can help improve model performance on regional tasks, both in capturing local structures and in interpreting message content. For instance, consider a machine learning engineer who automates a language classification task in a particular region, or a social scientist trying to understand a regional event that echoes on social media; both can take advantage of dialect-based language models to understand what is happening with more contextual information and hence more precision. This manuscript presents and describes a set of regionalized resources for the Spanish language built on four years of public Twitter messages geotagged in 26 Spanish-speaking countries. We introduce word embeddings based on FastText, language models based on BERT, and per-region sample corpora. We also provide a broad comparison among regions covering lexical and semantic similarities, as well as examples of using regional resources on message classification tasks.
- North America > United States (0.14)
- South America > Argentina (0.05)
- North America > Cuba (0.04)
- Information Technology > Services (0.93)
- Health & Medicine (0.68)
The Venezuelans Trying to Escape Their Country Through Video Game Grunt Work
On a recent afternoon in Maracaibo, Venezuela, Alexander Marinez, who has short-cropped black hair and three-to-four-day stubble, sat in front of his computer tracking herbiboars in the mushroom forests on Fossil Island. He pressed down on his glowing mouse, the newest addition to his otherwise timeworn gaming setup. The pixelated character on his computer screen followed the tracks of a hedgehoglike creature with triangular tusks and herbs growing out of its back. Outside Marinez's one-story house, the sun bore down on the dirt road. His home lies about six miles away from the strait that connects the Caribbean Sea with Lake Maracaibo, one of the world's richest sources of oil. The character inspected a tunnel. Suddenly, the herbiboar appeared, and the character attacked, stunning it.
- South America > Venezuela > Zulia State > Maracaibo (0.46)
- Atlantic Ocean > Caribbean Sea (0.25)
- South America > Venezuela > Lake Maracaibo (0.24)
- Leisure & Entertainment > Games > Computer Games (1.00)
- Government (1.00)
- Banking & Finance (1.00)
- Information Technology > Communications (0.95)
- Information Technology > Artificial Intelligence > Games (0.51)
This is Artificial Intelligence's dirty little secret | Gadgets Now
SAN FRANCISCO: There's a dirty little secret about artificial intelligence: It's powered by hundreds of thousands of real people. From makeup artists in Venezuela to women in conservative parts of India, people around the world are doing the digital equivalent of needlework: drawing boxes around cars in street photos, tagging images, and transcribing snatches of speech that computers can't quite make out. Such data feeds directly into "machine learning" algorithms that help self-driving cars wind through traffic and let Alexa figure out that you want the lights on. These repetitive tasks pay pennies apiece. But in bulk, this work can offer a decent wage in many parts of the world, even in the U.S.
- Asia > India (0.27)
- North America > United States > California > San Francisco County > San Francisco (0.25)
- South America > Venezuela > Zulia State > Maracaibo (0.05)
- Information Technology (1.00)
- Transportation > Passenger (0.51)
- Transportation > Ground > Road (0.51)
- Consumer Products & Services > Hotels (0.49)
Artificial intelligence has a dirty little secret: It's powered by people
There's a dirty little secret about artificial intelligence: It's powered by an army of real people. From makeup artists in Venezuela to women in conservative parts of India, people around the world are doing the digital equivalent of needlework -- drawing boxes around cars in street photos, tagging images, and transcribing snatches of speech that computers can't quite make out. Such data feeds directly into "machine learning" algorithms that help self-driving cars wind through traffic and let Alexa figure out that you want the lights on. These repetitive tasks pay pennies apiece. But in bulk, this work can offer a decent wage in many parts of the world -- even in the U.S. And it underpins a technology that could change humanity forever: AI that will drive us around, execute verbal commands without flaw, and -- possibly -- one day think on its own.
- Asia > India (0.26)
- North America > United States > Massachusetts > Norfolk County > Franklin (0.06)
- South America > Venezuela > Zulia State > Maracaibo (0.05)
- Consumer Products & Services > Hotels (0.50)
- Transportation > Ground > Road (0.37)
Real people do much of 'artificial intelligence' work
There's a dirty little secret about artificial intelligence: It's powered by an army of real people. From makeup artists in Venezuela to women in conservative parts of India, people around the world are doing the digital equivalent of needlework -- drawing boxes around cars in street photos, tagging images, and transcribing snatches of speech that computers can't quite make out. Such data feeds directly into "machine learning" algorithms that help self-driving cars wind through traffic and let Alexa figure out that you want the lights on. These repetitive tasks pay pennies apiece. But in bulk, this work can offer a decent wage in many parts of the world -- even in the U.S.
- Asia > India (0.27)
- North America > United States > Massachusetts > Norfolk County > Franklin (0.06)
- South America > Venezuela > Zulia State > Maracaibo (0.05)
- Information Technology (1.00)
- Transportation > Passenger (0.51)
- Transportation > Ground > Road (0.51)
- Consumer Products & Services > Hotels (0.49)
AI's dirty little secret
San Francisco - There's a dirty little secret about artificial intelligence: It's powered by an army of real people. From makeup artists in Venezuela to women in conservative parts of India, people around the world are doing the digital equivalent of needlework - drawing boxes around cars in street photos, tagging images, and transcribing snatches of speech that computers can't quite make out. Such data feeds directly into "machine learning" algorithms that help self-driving cars wind through traffic and let Alexa figure out that you want the lights on. These repetitive tasks pay pennies apiece. But in bulk, this work can offer a decent wage in many parts of the world - even in the US.
- Asia > India (0.26)
- North America > United States > California > San Francisco County > San Francisco (0.25)
- South America > Venezuela > Zulia State > Maracaibo (0.05)
- Consumer Products & Services > Hotels (0.50)
- Transportation > Ground > Road (0.37)
Wonder the taskforce behind AI? It's humans
There's a dirty little secret about artificial intelligence: It's powered by hundreds of thousands of real people. From makeup artists in Venezuela to women in conservative parts of India, people around the world are doing the digital equivalent of needlework -- drawing boxes around cars in street photos, tagging images, and transcribing snatches of speech that computers can't quite make out. Such data feeds directly into "machine learning" algorithms that help self-driving cars wind through traffic and let Alexa figure out that you want the lights on. These repetitive tasks pay pennies apiece. But in bulk, this work can offer a decent wage in many parts of the world -- even in the US.
- Asia > India (0.27)
- South America > Venezuela > Zulia State > Maracaibo (0.05)
- North America > United States > Massachusetts > Norfolk County > Franklin (0.05)
- Information Technology (1.00)
- Transportation > Ground > Road (0.52)
- Transportation > Passenger (0.51)
- Consumer Products & Services > Hotels (0.49)